feature dataset
Sparse Autoencoder Insights on Voice Embeddings
Pluth, Daniel, Zhou, Yu, Gurbani, Vijay K.
Recent advances in explainable machine learning have highlighted the potential of sparse autoencoders in uncovering mono-semantic features in densely encoded embeddings. While most research has focused on Large Language Model (LLM) embeddings, the applicability of this technique to other domains remains largely unexplored. This study applies sparse autoencoders to speaker embeddings generated from a Titanet model, demonstrating the effectiveness of this technique in extracting mono-semantic features from non-textual embedded data. The results show that the extracted features exhibit characteristics similar to those found in LLM embeddings, including feature splitting and steering. The analysis reveals that the autoencoder can identify and manipulate features such as language and music, which are not evident in the original embedding. The findings suggest that sparse autoencoders can be a valuable tool for understanding and interpreting embedded data in many domains, including audio-based speaker recognition.
Learning Wireless Data Knowledge Graph for Green Intelligent Communications: Methodology and Experiments
Huang, Yongming, You, Xiaohu, Zhan, Hang, He, Shiwen, Fu, Ningning, Xu, Wei
Intelligent communications have played a pivotal role in shaping the evolution of 6G networks. Native artificial intelligence (AI) within green communication systems must meet stringent real-time requirements. To achieve this, deploying lightweight and resource-efficient AI models is necessary. However, although wireless networks generate a multitude of data fields and indicators during operation, only a fraction of them has a significant impact on the network AI models. Therefore, real-time intelligence of communication systems heavily relies on a small but critical set of the data that profoundly influences the performance of network AI models. These challenges underscore the need for innovative architectures and solutions. In this paper, we propose a solution, termed the pervasive multi-level (PML) native AI architecture, which integrates the concept of knowledge graph (KG) into the intelligent operational manipulations of mobile networks, resulting in the establishment of a wireless data KG. Leveraging the wireless data KG, we characterize the massive and complex data collected from wireless communication networks and analyze the relationships among various data fields. The obtained graph of data field relations enables the on-demand generation of minimal and effective datasets, referred to as feature datasets, tailored to specific application requirements. Consequently, this architecture not only enhances AI training, inference, and validation processes but also significantly reduces resource wastage and overhead for communication networks. To implement this architecture, we have developed a specific solution comprising a spatio-temporal heterogeneous graph attention neural network model (STREAM) as well as a feature dataset generation algorithm. Experiments are conducted to validate the effectiveness of the proposed architecture.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Oceania > Australia > Victoria > Melbourne (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- Energy (0.93)
- Telecommunications (0.93)
- Information Technology (0.67)
Preparing the Features Dataset using Amibroker Exploration - Machine Learning
There are various methods to prepare the feature dataset, which is a crucial input for a machine learning prediction model. One approach is to code technical indicators using Python and feed them as input to the model. Another simpler approach is to utilize Amibroker's AFL Exploration, which provides built-in indicators and also supports custom indicators that are easy to code, explore, and prepare as feature datasets. The exploration dataset can then be extracted in CSV format. In machine learning, features represent the input data points or independent variables used to describe various aspects of the object under study.
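The Python route mentioned above can be sketched as follows. This is a minimal illustration, not the article's code: the two indicators (a simple moving average and a rate of change), the column names, and the sample closing prices are all assumptions chosen for the example, and only the standard library is used.

```python
import csv
import io


def simple_moving_average(values, window):
    """Return the SMA series; bars without a full window yield None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def rate_of_change(values, period):
    """Percentage change versus the value `period` bars earlier."""
    out = []
    for i in range(len(values)):
        if i < period:
            out.append(None)
        else:
            prev = values[i - period]
            out.append(100.0 * (values[i] - prev) / prev)
    return out


def build_feature_rows(closes, window=3, period=1):
    """Combine the indicators into one feature row per usable bar."""
    sma = simple_moving_average(closes, window)
    roc = rate_of_change(closes, period)
    rows = []
    for close, s, r in zip(closes, sma, roc):
        if s is None or r is None:
            continue  # skip warm-up bars that lack a full lookback
        rows.append({"close": close, "sma": s, "roc": r})
    return rows


def rows_to_csv(rows):
    """Serialize the feature rows as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["close", "sma", "roc"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical closing prices, for illustration only.
closes = [10.0, 11.0, 12.0, 11.0, 13.0]
rows = build_feature_rows(closes)
print(rows_to_csv(rows))
```

Each row of the resulting CSV is one observation, and each column is one feature in the sense described above.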
Minerva: A File-Based Ransomware Detector
Hitaj, Dorjan, Pagnotta, Giulio, De Gaspari, Fabio, De Carli, Lorenzo, Mancini, Luigi V.
Ransomware is a rapidly evolving type of malware designed to encrypt user files on a device, making them inaccessible in order to exact a ransom. Ransomware attacks resulted in billions of dollars in damages in recent years and are expected to cause hundreds of billions more in the next decade. With current state-of-the-art process-based detectors being heavily susceptible to evasion attacks, no comprehensive solution to this problem is available today. This paper presents Minerva, a new approach to ransomware detection. Unlike current methods focused on identifying ransomware based on process-level behavioral modeling, Minerva detects ransomware by building behavioral profiles of files based on all the operations they receive in a time window. Minerva addresses some of the critical challenges associated with process-based approaches, specifically their vulnerability to complex evasion attacks. Our evaluation of Minerva demonstrates its effectiveness in detecting ransomware attacks, including those that are able to bypass existing defenses. Our results show that Minerva identifies ransomware activity with an average accuracy of 99.45% and an average recall of 99.66%, with 99.97% of ransomware detected within 1 second.
- North America > Canada > Alberta > Census Division No. 6 > Calgary Metropolitan Region > Calgary (0.14)
- Europe > Italy > Lazio > Rome (0.04)
Introduction to Machine Learning: K Means - PythonAlgos
Welcome to our fourth installment on Machine Learning. In this module we're going to cover K-Means. K-Means is a clustering algorithm based on the hyperparameter "K" which dictates how many clusters there will be. A hyperparameter is just a parameter that we can adjust. Each cluster has a "centroid" or a central point that will be the anchor of our cluster.
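As a rough sketch of the idea, the snippet below implements the standard K-Means loop (assign each point to its nearest centroid, then move each centroid to the mean of its points) in plain Python. It is not the tutorial's own code: the function name, the naive "first K points" initialization, and the sample points are assumptions made for illustration.

```python
import math


def kmeans(points, k, iterations=100):
    """Cluster `points` (tuples of floats) into k groups.

    Returns (centroids, assignments), where assignments[i] is the
    index of the centroid that points[i] belongs to.
    """
    # Naive initialization (an assumption): use the first k points.
    centroids = [tuple(p) for p in points[:k]]
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep empty clusters in place
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, assignments


# Hypothetical 2-D points forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.0, 9.5)]
centroids, assignments = kmeans(points, k=2)
print(centroids)
print(assignments)
```

Here "K" is passed as `k=2`; changing it changes how many centroids, and therefore how many clusters, the loop maintains.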
Hypergraph convolutional neural network-based clustering technique
Tran, Loc H., Trinh, Nguyen, Tran, Linh H.
This paper presents a novel hypergraph convolutional neural network-based clustering technique. This technique is employed to solve the clustering problem for the Citeseer dataset and the Cora dataset. Each dataset contains the feature matrix and the incidence matrix of the hypergraph (i.e., constructed from the feature matrix). This novel clustering method utilizes both matrices. Initially, the hypergraph auto-encoders are employed to transform both the incidence matrix and the feature matrix from high dimensional space to low dimensional space. In the end, we apply the k-means clustering technique to the transformed matrix. In the experiments, the hypergraph convolutional neural network (CNN)-based clustering technique performed better than the other classical clustering techniques.
- Asia > Vietnam > Hồ Chí Minh City > Hồ Chí Minh City (0.06)
- North America > United States > New York (0.04)
- North America > United States > Minnesota (0.04)
Survival Prediction of Children Undergoing Hematopoietic Stem Cell Transplantation Using Different Machine Learning Classifiers by Performing Chi-squared Test and Hyper-parameter Optimization: A Retrospective Analysis
Ratul, Ishrak Jahan, Wani, Ummay Habiba, Nishat, Mirza Muntasir, Al-Monsur, Abdullah, Ar-Rafi, Abrar Mohammad, Faisal, Fahim, Kabir, Mohammad Ridwan
Bone Marrow Transplant (BMT), a gradational rescue for a wide range of disorders emanating from the bone marrow, is an efficacious surgical treatment. Several risk factors, such as post-transplant illnesses, new malignancies, and even organ damage, can impair long-term survival. Therefore, technologies like Machine Learning are deployed for investigating the survival prediction of BMT recipients along with the influences that limit their resilience. In this study, an efficient survival classification model is presented in a comprehensive manner, incorporating the Chi-squared feature selection method to address the dimensionality problem and Hyper Parameter Optimization (HPO) to increase accuracy. A synthetic dataset is generated by imputing the missing values, transforming the data using dummy variable encoding, and compressing the dataset from 59 features to the 11 most correlated features using Chi-squared feature selection. The dataset was split into train and test sets at a ratio of 80:20, and the hyperparameters were optimized using Grid Search Cross-Validation. Several supervised ML methods were trained in this regard, like Decision Tree, Random Forest, Logistic Regression, K-Nearest Neighbors, Gradient Boosting Classifier, Ada Boost, and XG Boost. The simulations have been performed for both the default and optimized hyperparameters by using the original and reduced synthetic datasets. After ranking the features using the Chi-squared test, it was observed that the top 11 features with HPO resulted in the same accuracy of prediction (94.73%) as the entire dataset with default parameters. Moreover, this approach requires less time and resources for predicting the survivability of children undergoing BMT. Hence, the proposed approach may aid in the development of a computer-aided diagnostic system with satisfactory accuracy and minimal computation time by utilizing medical data records.
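To illustrate the feature-ranking step, the sketch below scores categorical features with the classical chi-squared test of independence against the class label and keeps the top-scoring ones. It is an illustration only and not the authors' pipeline: the helper names, the toy columns A, B, and C, and the labels are hypothetical, and library implementations of chi-squared feature selection may compute the statistic differently.

```python
from collections import Counter


def chi_squared(feature, labels):
    """Chi-squared statistic for independence of a categorical
    feature and the class label (higher means more dependent)."""
    n = len(labels)
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(feature, labels))
    stat = 0.0
    for f, fc in feature_counts.items():
        for l, lc in label_counts.items():
            expected = fc * lc / n  # count expected under independence
            observed = joint_counts.get((f, l), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def top_k_features(columns, labels, k):
    """Rank feature columns by chi-squared score and keep the best k."""
    scores = {name: chi_squared(vals, labels) for name, vals in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: 8 samples, binary label, three binary features.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "A": [1, 1, 1, 1, 0, 0, 0, 0],  # identical to the label
    "B": [0, 1, 0, 1, 0, 1, 0, 1],  # unrelated to the label
    "C": [1, 1, 1, 0, 0, 0, 0, 1],  # partially related
}
selected, scores = top_k_features(columns, labels, k=2)
print(selected, scores)
```

In the study's setting, the same kind of ranking reduces 59 features to 11 before the classifiers are trained and tuned.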
- North America > United States > California > Orange County > Irvine (0.04)
- Asia > Japan (0.04)
- Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area > Hematology > Stem Cells (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
Predicting The Wind Speed Using K-Neighbors Classifier
Hope you all are doing well in this hard time of the Covid era. In this article, we are going to predict the wind speed of the current date and time for any given latitude and longitude coordinates. We'll be using a K-neighbors classifier to build our predicting model. The dataset we are using is available on GitHub here. The first step, which I always suggest, is to check which Python version you are using.
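For readers unfamiliar with the K-neighbors classifier, here is a minimal pure-Python sketch of how it makes a prediction: find the k training samples closest to the query and return the majority label among them. The article's actual model and dataset are not reproduced here; the function name, the two-number feature vectors, and the "calm"/"windy" labels are assumptions for illustration.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its
    k nearest training samples (Euclidean distance)."""
    nearest = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: two numeric features per sample
# (for example, scaled coordinates) and a wind-speed class.
train_X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_y = ["calm", "calm", "calm", "windy", "windy", "windy"]

near_origin = knn_predict(train_X, train_y, (0.5, 0.5))
far_corner = knn_predict(train_X, train_y, (5.5, 5.5))
print(near_origin, far_corner)
```

Because the vote is over discrete labels, a classifier of this kind predicts wind-speed classes rather than a continuous value.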
Synergistic Drug Combination Prediction by Integrating Multi-omics Data in Deep Learning Models
Zhang, Tianyu, Zhang, Liwei, Payne, Philip R. O., Li, Fuhai
Drug resistance is still a major challenge in cancer therapy. Drug combination is expected to overcome drug resistance. However, the number of possible drug combinations is enormous, and thus it is infeasible to experimentally screen all effective drug combinations considering the limited resources. Therefore, computational models to predict and prioritize effective drug combinations are important for combinatory therapy discovery in cancer. In this study, we proposed a novel deep learning model, AuDNNsynergy, to predict synergistic drug combinations by integrating multi-omics data and chemical structure data. Specifically, three autoencoders were trained using the gene expression, copy number, and genetic mutation data of all tumor samples from The Cancer Genome Atlas. Then the physicochemical properties of drugs combined with the output of the three autoencoders, characterizing the individual cancer cell-lines, were used as the input of a deep neural network that predicts the synergy value of given pair-wise drug combinations against the specific cancer cell-lines. The comparison results showed the proposed AuDNNsynergy model outperforms four state-of-the-art approaches, namely DeepSynergy, Gradient Boosting Machines, Random Forests, and Elastic Nets. Moreover, we conducted the interpretation analysis of the deep learning model to investigate potential vital genetic predictors and the underlying mechanism of synergistic drug combinations on specific cancer cell-lines.
- Asia > China > Liaoning Province > Dalian (0.04)
- North America > United States > Missouri > St. Louis County > St. Louis (0.04)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)